A knowledge-based approach for predicting gene-disease associations
Abstract
MOTIVATION: Recent advances in next-generation sequencing technologies have made it possible to identify gene variations rapidly and inexpensively. Knowing the disease associations of these gene variations is important for early intervention in deadly diseases and for providing possible targets to cure them. Genome-wide association studies (GWAS) have identified many individual genes associated with common diseases. To exploit the large amount of data obtained from GWAS and leverage our understanding of common as well as rare diseases, we have developed a knowledge-based approach to predict gene-disease associations. We first derive gene-gene mutual information from the co-occurrence of genes in known gene-disease association data. The mutual information is then combined with known protein-protein interaction networks by a boosted tree regression method.
RESULTS: The method, called Know-GENE, is compared with random walk on the heterogeneous network using the same input data. For a set of 960 diseases, using the same training data in 3-fold cross-validation, the average recall rate within the top-ranked 100 genes is 65.0% for Know-GENE compared with 37.9% for the state-of-the-art random walk on the heterogeneous network. This significant improvement is mostly due to the inclusion of knowledge-based mutual information.
AVAILABILITY AND IMPLEMENTATION: Predictions for genes associated with the 960 diseases are available at http://cssb2.biology.gatech.edu/knowgene
CONTACT: [email protected]
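The abstract does not state how the gene-gene mutual information is estimated or how recall is computed, so the following is only a minimal sketch under stated assumptions. It treats each gene as a binary variable over diseases (associated or not) and computes the standard mutual information of two such variables from their co-occurrence counts. It also computes recall within the top k ranked genes as the fraction of known disease genes recovered. The function names `gene_mutual_information` and `recall_at_k` and the input layout are hypothetical, and the boosted tree combination with protein-protein interaction data is not shown.

```python
import math
from itertools import combinations


def gene_mutual_information(disease_genes):
    """Mutual information (in nats) for every gene pair.

    disease_genes: dict mapping disease id -> set of associated gene ids.
    Assumption (not from the abstract): each gene is a binary indicator
    over diseases, and MI is estimated from the 2x2 co-occurrence table.
    """
    n = len(disease_genes)
    # Invert the mapping: gene -> set of diseases it is associated with.
    gene_to_diseases = {}
    for disease, genes in disease_genes.items():
        for gene in genes:
            gene_to_diseases.setdefault(gene, set()).add(disease)

    mi = {}
    for a, b in combinations(sorted(gene_to_diseases), 2):
        da, db = gene_to_diseases[a], gene_to_diseases[b]
        n11 = len(da & db)           # diseases with both genes
        n10 = len(da) - n11          # only gene a
        n01 = len(db) - n11          # only gene b
        n00 = n - n11 - n10 - n01    # neither gene
        pa, pb = len(da) / n, len(db) / n
        cells = (
            (n11, pa, pb),
            (n10, pa, 1 - pb),
            (n01, 1 - pa, pb),
            (n00, 1 - pa, 1 - pb),
        )
        total = 0.0
        for count, px, py in cells:
            if count:  # empty cells contribute zero
                pxy = count / n
                total += pxy * math.log(pxy / (px * py))
        mi[(a, b)] = total
    return mi


def recall_at_k(ranked_genes, known_genes, k=100):
    """Fraction of known disease genes found among the top k ranked genes."""
    top = set(ranked_genes[:k])
    return len(top & set(known_genes)) / len(known_genes)
```

Under these assumptions, two genes that always appear in the same diseases get a high score, and two genes whose occurrences are statistically independent get a score of zero. Note that mutual information is also high for genes that never co-occur, so a real pipeline might need a signed or one-sided variant.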
Similar resources
Predicting Disease-Gene Associations using Cross-Document Graph-based Features
In the context of personalized medicine, text mining methods pose an interesting option for identifying disease-gene associations, as they can be used to generate novel links between diseases and genes which may complement knowledge from structured databases. The most straightforward approach to extract such links from text is to rely on a simple assumption postulating an association between al...
Exploring Gene Signatures in Different Molecular Subtypes of Gastric Cancer (MSS/TP53+, MSS/TP53-): A Network-based and Machine Learning Approach
Gastric cancer (GC) is one of the leading causes of cancer mortality worldwide. Molecular understanding of GC's different subtypes is still poor, and new subtype-specific diagnostic and therapeutic approaches are needed. Comprehensive research in this area is therefore required to gain deeper insight into the molecular processes underlying these subtypes. In this st...
Predicting Behavior and Intention to Knowledge Sharing in Postgraduate Students Based on the Theory of Planned Behavior
Background: Knowledge sharing in university environments is essential and students' behavior in this field is based on their beliefs, norms and attitudes. Theory of planned behavior is one of the most prestigious behavior prediction models that can be used to examine the ideas, values, and attitudes in the context of knowledge sharing behavior. Considering the role of academics...
Inductive matrix completion for predicting gene–disease associations
MOTIVATION Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the ty...
Recent advances in predicting gene–disease associations
Deciphering gene-disease association is a crucial step in designing therapeutic strategies against diseases. There are experimental methods for identifying gene-disease associations, such as genome-wide association studies and linkage analysis, but these can be expensive and time consuming. As a result, various in silico methods for predicting associations from these and other data have been de...
Journal title:
- Bioinformatics
Volume 32, Issue 18
Pages: -
Publication date: 2016